🎛️ CUDA Optimization
Kernel Tuning, Memory Access Patterns, Thread Configuration
Fine-Tuning GPT-5 for GPU Kernel Generation
arxiv.org · 9h · 🎯 GPU Kernels
AI in Multiple GPUs: Understanding the Host and Device Paradigm
towardsdatascience.com · 1h · ⏱️ CUDA Events
Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A
arxiv.org · 9h · 🌊 CUDA Streams
Two Ways to Move Tensors Without Stopping: Inside vLLM's Async GPU Transfer Patterns
dev.to · 17h · Discuss: DEV · 🌊 CUDA Streams
Discussion - Investigation of Single Thread CPU "Throughput/cycle"
forums.anandtech.com · 14h · 📊 Profiling Tools
AI Inference Needs A Mix-And-Match Memory Strategy
semiengineering.com · 6h · 🎯 Tensor Cores
Show HN: GPU ROI simulator based on token usage and model architecture
axiomos.ai · 2d · Discuss: Hacker News · 📈 GPU Occupancy
The Efficiency Wall: Why the Next 1,000x Leap Isn’t More GPUs
pub.towardsai.net · 10h · 🌊 CUDA Streams
Linux Kernel Graphics Driver Development Now Experimenting With AI Code Review
phoronix.com · 17h · 🔍 Nsight
Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues
semiengineering.com · 6h · 📈 Occupancy Optimization
PC Optimization Tools
trendhunter.com · 1d · 📊 Profiling Tools
borodark/exmc: Probabilistic programming in BEAM
github.com · 17h · ⚡ ONNX Runtime
CPU cloth simulation performance comparable to GPU SotA
sig25ddmpd.github.io · 13h · Discuss: Hacker News · ✂️ CUTLASS
Hitting 1,000 tokens per second on a single RTX 5090
blog.alpindale.net · 3d · Discuss: Hacker News, Hacker News · 📈 Occupancy Optimization
Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization
machinelearning.apple.com · 2d · ⏱️ CUDA Events
Show HN: Solving Sudoku reasoning via Energy Geometric models
davisgeometric.com · 4h · Discuss: Hacker News · ✂️ CUTLASS
OLIX: Compute Manifesto
olix.com · 23h · Discuss: Hacker News · ⚡ CUDA Programming Patterns
How JVM Thread Scheduling Impacts Application Performance
dev.to · 10h · Discuss: DEV · ⏱️ CUDA Events
Mesa 26.0 Released With RADV Ray Tracing Performance Gains
linuxiac.com · 18h · 🔧 PTX
How Programmers Spend Their Time
probablydance.com · 1d · Discuss: Hacker News · ⚡ Flash Attention